Estimation of Secure Data Deduplication in Big Data
Author
Abstract
Big data refers to large, complex collections of data sets. In a big data environment, data is largely unstructured and may contain many duplicate copies of the same content. To manage such complex unstructured data, Hadoop is used: an open-source platform designed specifically for big data environments, which handles unstructured data far more efficiently than traditional data-processing tools. To reduce data duplication, the concept of deduplication is applied. In this paper, an evaluation of different chunking and deduplication techniques is presented.
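The chunking-plus-deduplication idea the abstract evaluates can be sketched as a two-step pipeline: split the stream into chunks, then store each unique chunk once under its content fingerprint. The sketch below is a minimal illustration, not code from the paper; the fixed-size chunker, the SHA-256 fingerprint, and the in-memory store are assumptions chosen for brevity (real systems often use content-defined chunking and a persistent index).

```python
import hashlib

def chunk_fixed(data: bytes, size: int = 8):
    """Split data into fixed-size chunks (the simplest chunking scheme)."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def deduplicate(chunks):
    """Hash-based deduplication: keep one copy of each unique chunk,
    keyed by its SHA-256 fingerprint, plus the ordered list of
    fingerprints ("recipe") needed to reconstruct the original stream."""
    store, recipe = {}, []
    for chunk in chunks:
        fp = hashlib.sha256(chunk).hexdigest()
        store.setdefault(fp, chunk)   # store only the first occurrence
        recipe.append(fp)             # every occurrence is referenced
    return store, recipe

# Three identical 8-byte blocks: stored once, referenced three times.
data = b"ABCDEFGHABCDEFGHABCDEFGH"
store, recipe = deduplicate(chunk_fixed(data))
restored = b"".join(store[fp] for fp in recipe)
```

Fixed-size chunking is fast but shifts all chunk boundaries when bytes are inserted; content-defined chunking (as compared in several of the surveyed techniques) avoids that at the cost of extra computation.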
Similar Resources
A Review Paper on Hybrid Cloud Approach for Secure Authorized Data Deduplication
Cloud computing is the best concept for handling big databases as the world moves toward digitization. The amount of digital data in the world is growing exponentially with time. Thus, employing storage optimization techniques is an essential requirement for large storage areas like cloud storage. Data deduplication is the best storage optimization techniqu...
A Dynamic Deduplication Approach for Big Data Storage
As data increases every day, it is a very challenging task to manage storage devices for this explosive growth of digital data. Data reduction has become a crucial problem. The deduplication approach plays a vital role in removing redundancy in large-scale cluster-computing storage. As a result, deduplication provides better storage utilization by eliminating redundant copies of data and savi...
Guest Editorial on Advances in Tools and Techniques for Enabling Cyber-Physical-Social Systems - Part II
Part II of the IEEE Transactions on Computational Social Systems Special Issue on Cyber-Physical-Social Systems (CPSS) includes six papers that are on emerging techniques for radio access networks, data deduplication, big data computing, smart community, cloud computing, and Internet of Things. The paper "QoE-Guaranteed and Power-Efficient Network Operation for Cloud Radio Access Network with ...
Ddup - towards a deduplication framework utilising apache spark
This paper presents a new framework called DeduPlication (DduP). DduP aims to solve large-scale deduplication problems on arbitrary data tuples, trying to bridge the gap between big data, high performance, and duplicate detection. At the moment a first prototype exists, but the overall project is a work in progress. DduP utilises the promising successor of Apache Hadoop MapReduce [Had14]...
Boafft: Distributed Deduplication for Big Data Storage in the Cloud
As data progressively grows within data centers, cloud storage systems continuously face challenges in saving storage capacity and providing the capabilities necessary to move big data within an acceptable time frame. In this paper, we present Boafft, a cloud storage system with distributed deduplication. Boafft achieves scalable throughput and capacity using multiple data servers to dedu...
Journal:
Volume, Issue
Pages: -
Publication date: 2017